r/MachineLearning - [P] A2C not working in OpenAI Pendulum
I've spent weeks trying to get an actor-critic reinforcement learning model to work with the OpenAI Pendulum environment, but I haven't been able to solve it yet. The critic (value) model predicts the value well and its loss is low. The actor, however, predicts actions all over the place, with its mean (mu) and variance (sigma) nowhere near what they should be. If I limit the mean using a tanh activation, then sigma keeps growing towards infinity. I've tried different activation functions, initializers, and hyperparameters, but nothing seems to work.
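
For anyone following along, here is a minimal sketch of the kind of Gaussian actor head described above, in plain Python so it runs without a deep learning framework. It is not my actual model code. The names `policy_head`, `softplus`, `log_prob`, and `entropy`, and the bounds `SIGMA_MIN` and `SIGMA_MAX`, are made up for illustration, and it treats sigma as a standard deviation. It shows the tanh-limited mean and one common way to keep sigma finite, which is to pass the raw output through softplus and then bound it.

```python
import math

MAX_TORQUE = 2.0                  # Pendulum torques lie in [-2, 2]
SIGMA_MIN, SIGMA_MAX = 1e-3, 2.0  # assumed bounds, chosen only for illustration


def softplus(x):
    # Numerically stable log(1 + exp(x)); the result is always positive.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def policy_head(raw_mu, raw_sigma):
    # raw_mu and raw_sigma stand in for the actor network's two unbounded outputs.
    mu = MAX_TORQUE * math.tanh(raw_mu)     # mean squashed into the action range
    sigma = softplus(raw_sigma)             # positive standard deviation
    sigma = min(max(sigma, SIGMA_MIN), SIGMA_MAX)  # keep sigma from collapsing or exploding
    return mu, sigma


def log_prob(action, mu, sigma):
    # Log-density of a Normal(mu, sigma) at `action`, as used in the policy-gradient loss.
    z = (action - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def entropy(sigma):
    # Entropy of a Normal with standard deviation sigma; it grows with log(sigma).
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma * sigma)


mu, sigma = policy_head(raw_mu=100.0, raw_sigma=100.0)
print(mu, sigma)  # both saturate at their upper bounds
```

Since the entropy grows with log(sigma), any entropy bonus in the loss pushes sigma upward, so the bonus coefficient may be worth checking. A hard bound like the one above also passes no gradient once it saturates in an autodiff framework, so it is a guard rail rather than a fix for whatever is driving sigma up.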